Intelligent Paper Notes

Formalization of the principles of brain Programming (Brain Principles Programming)

E. E. Vityaev , A. G. Kolonin , A. V. Kurpatov , A. A. Molchanov

Category: Artificial Intelligence

2022-05-13

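The monograph "Strong Artificial Intelligence. On the Approaches to Superintelligence" contains an overview of artificial general intelligence (AGI). As an anthropomorphic research area, it includes Brain Principles Programming (BPP): the formalization of the universal mechanisms (principles) of the brain, which are implemented at all levels of the organization of nervous tissue. The monograph formalizes these principles in terms of category theory. However, this formalization is not sufficient for developing algorithms for working with information. In this paper, for the description and modeling of BPP, we propose applying previously developed mathematical models and algorithms that model cognitive functions and are based on well-known physiological, psychological, and other natural-science theories. The paper uses mathematical models and algorithms from the following theories: P. K. Anokhin's theory of functional brain systems, Eleanor Rosch's prototype theory of categorization, Bob Rehder's theory of causal models, and "natural" classification. As a result, a formalization of BPP is obtained, and computer experiments demonstrating the operation of the algorithms are presented.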

translated by Google Translate

A Fine-Grained Vehicle Detection (FGVD) Dataset for Unconstrained Roads

Prafful Kumar Khoba , Chirag Parikh , Rohit Saluja , Ravi Kiran Sarvadevabhatla , C. V. Jawahar

Category: Computer Vision

2022-12-30

Previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in types, scale, pose, occlusion, and lighting conditions. Current object detectors such as YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on the FGVD dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.

Artificial Intelligence to Enhance Mission Science Output for In-situ Observations: Dealing with the Sparse Data Challenge

M. I. Sitnov , G. K. Stephens , V. G. Merkin , C. -P. Wang , D. Turner , K. Genestreti , M. Argall , T. Y. Chen , A. Y. Ukhorskiy , S. Wing

Category: Machine Learning

2022-12-26

In the Earth's magnetosphere, there are fewer than a dozen dedicated probes beyond low-Earth orbit making in-situ observations at any given time. As a result, we poorly understand its global structure and evolution, the mechanisms of its main activity processes, magnetic storms, and substorms. New Artificial Intelligence (AI) methods, including machine learning, data mining, and data assimilation, as well as new AI-enabled missions will need to be developed to meet this Sparse Data challenge.

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages

Arnav Mhaske , Harshit Kedia , Sumanth Doddapaneni , Mitesh M. Khapra , Pratyush Kumar , Rudra Murthy V , Anoop Kunchukuttan

Category: Natural Language Processing

2022-12-20

We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. For 9 of the 11 languages, it contains more than 400k sentences per language, annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization). The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian language sentence. We also create manually annotated test sets for 8 languages containing approximately 1000 sentences per language. We demonstrate the utility of the obtained dataset on existing test sets and the Naamapadam-test data for 8 Indic languages. We also release IndicNER, a multilingual mBERT model fine-tuned on the Naamapadam training set. IndicNER achieves the best F1 on the Naamapadam-test set compared to an mBERT model fine-tuned on existing datasets. IndicNER achieves an F1 score of more than 80 for 7 out of 11 Indic languages. The dataset and models are available under open-source licenses at https://ai4bharat.iitm.ac.in/naamapadam.
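The projection step described above can be illustrated with a minimal sketch. It assumes word-level alignments between the English and target sentences are already available and that tags use the BIO scheme; the abstract states neither, and the function name `project_tags`, the example sentence, and the alignment pairs are all hypothetical.

```python
from typing import List, Tuple


def project_tags(
    src_tags: List[str],
    tgt_len: int,
    alignment: List[Tuple[int, int]],
) -> List[str]:
    """Copy entity types from source tokens to aligned target tokens.

    `alignment` holds (source_index, target_index) pairs. This is an
    illustrative assumption, not the authors' projection procedure.
    """
    # Entity type per target token; "" means outside any entity.
    tgt_types = [""] * tgt_len
    for src_i, tgt_j in alignment:
        tag = src_tags[src_i]
        if tag != "O":
            tgt_types[tgt_j] = tag.split("-", 1)[1]  # drop the "B-"/"I-" prefix

    # Rebuild BIO prefixes in target word order: a token begins an entity
    # whenever the previous target token carries a different type.
    tgt_tags = []
    prev = ""
    for ent_type in tgt_types:
        if not ent_type:
            tgt_tags.append("O")
        elif ent_type == prev:
            tgt_tags.append("I-" + ent_type)
        else:
            tgt_tags.append("B-" + ent_type)
        prev = ent_type
    return tgt_tags


# Hypothetical English-Hindi sentence pair with hand-written alignments.
src_tokens = ["Sachin", "Tendulkar", "lives", "in", "Mumbai"]
src_tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
tgt_tokens = ["सचिन", "तेंदुलकर", "मुंबई", "में", "रहते", "हैं"]
alignment = [(0, 0), (1, 1), (4, 2), (3, 3), (2, 4)]

projected = project_tags(src_tags, len(tgt_tokens), alignment)
for token, tag in zip(tgt_tokens, projected):
    print(token, tag)
```

Rebuilding the BIO prefixes on the target side, rather than copying them verbatim, keeps the tag sequence valid when word order differs between the two languages.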

Interpretable ML for Imbalanced Data

Damien A. Dablain , Colin Bellinger , Bartosz Krawczyk , David W. Aha , Nitesh V. Chawla

Category: Machine Learning

2022-12-15

Deep learning models are being increasingly applied to imbalanced data in high-stakes fields such as medicine, autonomous driving, and intelligence analysis. Imbalanced data compounds the black-box nature of deep networks because the relationships between classes may be highly skewed and unclear. This can reduce trust by model users and hamper the progress of developers of imbalanced learning algorithms. Existing methods that investigate imbalanced data complexity are geared toward binary classification, shallow learning models and low-dimensional data. In addition, current eXplainable Artificial Intelligence (XAI) techniques mainly focus on converting opaque deep learning models into simpler models (e.g., decision trees) or mapping predictions for specific instances to inputs, instead of examining global data properties and complexities. Therefore, there is a need for a framework that is tailored to modern deep networks, that incorporates large, high-dimensional, multi-class datasets, and uncovers data complexities commonly found in imbalanced data (e.g., class overlap, sub-concepts, and outlier instances). We propose a set of techniques that can be used by both deep learning model users to identify, visualize and understand class prototypes, sub-concepts and outlier instances; and by imbalanced learning algorithm developers to detect features and class exemplars that are key to model performance. Our framework also identifies instances that reside on the border of class decision boundaries, which can carry highly discriminative information. Unlike many existing XAI techniques which map model decisions to gray-scale pixel locations, we use saliency through back-propagation to identify and aggregate image color bands across entire classes. Our framework is publicly available at https://github.com/dd1github/XAI_for_Imbalanced_Learning

PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation

Maxwell A. Xu , Alexander Moreno , Supriya Nagesh , V. Burak Aydemir , David W. Wetter , Santosh Kumar , James M. Rehg

Category: Machine Learning | Artificial Intelligence

2022-12-14

The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.

FAIR AI Models in High Energy Physics

Javier Duarte , Haoyang Li , Avik Roy , Ruike Zhu , E. A. Huerta , Daniel Diaz , Philip Harris , Raghav Kansal , Daniel S. Katz , Ishaan H. Kavoori

Category: Machine Learning

2022-12-09

The findable, accessible, interoperable, and reusable (FAIR) data principles have provided a framework for examining, evaluating, and improving how we share data with the aim of facilitating scientific discovery. Efforts have been made to generalize these principles to research software and other digital products. Artificial intelligence (AI) models -- algorithms that have been trained on data rather than explicitly programmed -- are an important target for this because of the ever-increasing pace with which AI is transforming scientific and engineering domains. In this paper, we propose a practical definition of FAIR principles for AI models and create a FAIR AI project template that promotes adherence to these principles. We demonstrate how to implement these principles using a concrete example from experimental high energy physics: a graph neural network for identifying Higgs bosons decaying to bottom quarks. We study the robustness of these FAIR AI models and their portability across hardware architectures and software frameworks, and report new insights on the interpretability of AI predictions by studying the interplay between FAIR datasets and AI models. Enabled by publishing FAIR AI models, these studies pave the way toward reliable and automated AI-driven scientific discovery.

Formulation of problems of combinatorial optimization for solving problems of management and planning of cloud production

M. V. Saramud , E. A. Spirin , E. P. Talay , I. I. Pikalov

Category: Robotics

2022-12-05

We consider the application of combinatorial optimization to planning problems for industries based on a fund of reconfigurable production resources. The results of solving these problems by mixed integer programming methods are presented.

A PM2.5 concentration prediction framework with vehicle tracking system: From cause to effect

Chuong D. Le , Hoang V. Pham , Duy A. Pham , An D. Le , Hien B. Vo

Category: Computer Vision

2022-12-04

Air pollution is an emerging problem that needs to be solved in developed and developing countries alike. In Vietnam, air pollution is a concerning issue in big cities such as Hanoi and Ho Chi Minh City, where it comes mostly from vehicles such as cars and motorbikes. To tackle the problem, the paper develops a solution that estimates emitted PM2.5 pollutants by counting the number of vehicles in traffic. We first surveyed recent object detection models and developed our own traffic surveillance system. The observed traffic density showed a similar trend to the measured PM2.5 with a certain time lag, suggesting a relation between traffic density and PM2.5. We further express this relationship with a mathematical model that estimates the PM2.5 value from the observed traffic density. The estimated result showed a strong correlation with the measured PM2.5 plots in the urban area context.
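The lagged relationship between traffic density and PM2.5 can be illustrated with a small sketch. It scans candidate delays and reports the one with the highest Pearson correlation. The abstract does not specify the paper's mathematical model, so this shows only the lag-estimation idea; the series are synthetic, and the names `pearson` and `best_lag` are hypothetical.

```python
import math
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def best_lag(traffic: List[float], pm25: List[float], max_lag: int) -> Tuple[int, float]:
    """Return (lag, r) maximizing the correlation of traffic[t] with pm25[t + lag]."""
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        # Align traffic at time t with PM2.5 measured `lag` steps later.
        x = traffic[: len(traffic) - lag]
        y = pm25[lag:]
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic data (an assumption for illustration): a daily traffic cycle and
# a PM2.5 series that follows it linearly with a delay of 3 time steps.
traffic = [50 + 30 * math.sin(2 * math.pi * t / 24) for t in range(96)]
pm25 = [10 + 0.4 * traffic[max(t - 3, 0)] for t in range(96)]

lag, r = best_lag(traffic, pm25, max_lag=6)
print(f"best lag: {lag} steps, correlation: {r:.3f}")
```

Once a lag is chosen, a simple regression of PM2.5 on the shifted traffic series would be one way to turn the observed trend into an estimator.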

DimenFix: A novel meta-dimensionality reduction method for feature preservation

Qiaodan Luo , Leonardo Christino , Fernando V Paulovich , Evangelos Milios

Category: Machine Learning

2022-11-30

Dimensionality reduction has become an important research topic as demand for interpreting high-dimensional datasets has been increasing rapidly in recent years. There have been many dimensionality reduction methods with good performance in preserving the overall relationship among data points when mapping them to a lower-dimensional space. However, these existing methods fail to incorporate the difference in importance among features. To address this problem, we propose a novel meta-method, DimenFix, which can be operated upon any base dimensionality reduction method that involves a gradient-descent-like process. By allowing users to define the importance of different features, which is considered in dimensionality reduction, DimenFix creates new possibilities to visualize and understand a given dataset. Meanwhile, DimenFix does not increase the time cost or reduce the quality of dimensionality reduction with respect to the base dimensionality reduction used.
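To make the idea of a meta-method wrapped around a gradient-descent-like process concrete, here is a minimal sketch. It rests on an assumption the abstract does not confirm: that feature preservation can be imposed by resetting one embedding axis to a user-chosen feature after every update. The base method is a plain stress-minimizing MDS written for this example; the function names and the data are hypothetical.

```python
import math
import random
from typing import List


def pairwise(points: List[List[float]]) -> List[List[float]]:
    """Matrix of Euclidean distances between all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def mds_with_fixed_axis(
    data: List[List[float]],
    feature: List[float],
    steps: int = 200,
    lr: float = 0.01,
    seed: int = 0,
) -> List[List[float]]:
    """2-D MDS by gradient descent with axis 0 pinned to `feature`.

    Illustrative only: the pinning step is an assumed form of feature
    preservation, not the published DimenFix algorithm.
    """
    rng = random.Random(seed)
    target = pairwise(data)
    n = len(data)
    emb = [[feature[i], rng.uniform(-1.0, 1.0)] for i in range(n)]
    for _ in range(steps):
        # Base method: gradient of sum_{i<j} (d_ij - target_ij)^2.
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(emb[i], emb[j]) or 1e-9  # avoid division by zero
                coef = 2.0 * (d - target[i][j]) / d
                grads[i][0] += coef * (emb[i][0] - emb[j][0])
                grads[i][1] += coef * (emb[i][1] - emb[j][1])
        for i in range(n):
            emb[i][0] -= lr * grads[i][0]
            emb[i][1] -= lr * grads[i][1]
            # Meta-step: re-impose the chosen feature on axis 0.
            emb[i][0] = feature[i]
    return emb


# Hypothetical 3-D data; the first column is the feature to preserve.
data = [
    [0.0, 0.0, 0.0],
    [1.0, 0.2, 0.1],
    [2.0, 0.1, 0.9],
    [3.0, 1.0, 0.3],
    [4.0, 0.8, 1.1],
    [5.0, 0.2, 0.4],
]
feature = [row[0] for row in data]
embedding = mds_with_fixed_axis(data, feature)
for point in embedding:
    print([round(v, 3) for v in point])
```

Because the pinning step runs inside the existing update loop, it adds only a constant amount of work per point per iteration, which fits the abstract's claim that the time cost of the base method does not increase.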
